guanjiawei.ai · Jiawei Guan's Personal Site

Tags:

Inference Optimization

1 post

AI · Inference Optimization · Infrastructure · Reflections

When a Model Is 5× Faster, It’s No Longer the Same Model

Gemini 3.5 Flash and Zhipu GLM-5.1's 400 token/s mode show that crossing a ~5× inference speedup unlocks a different product category, not just faster answers.

May 22, 2026 · 6 min read